Estimating the Proportion of True Null Hypotheses for Multiple Comparisons
Authors
Abstract
Whole-genome microarray investigations (e.g., differential expression, differential methylation, ChIP-Chip) provide opportunities to test millions of features in a genome. Traditional multiple comparison procedures, such as those controlling the familywise error rate (FWER), are too conservative. Although false discovery rate (FDR) procedures have been suggested as having greater power, the control itself is not exact and depends on the proportion of true null hypotheses. Because this proportion is unknown, it has to be estimated accurately (small bias, small variance), preferably using a simple calculation that can be made accessible to the general scientific community. We propose an easy-to-implement method for estimating the proportion of true null hypotheses and make the R code available. Comparisons with four existing procedures on simulated and real data show that the estimate has relatively small bias and small variance. Although presented here in the context of microarrays, the estimate is applicable to many multiple comparison situations.
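To make concrete why FDR control depends on the proportion of true nulls, the following is a minimal Python sketch. It is not the method proposed in this paper (whose R code is not reproduced here). It assumes a simple threshold-based estimator of the Storey type with a fixed cutoff `lam`, and plugs the estimate into a Benjamini-Hochberg step-up procedure. The function names `estimate_pi0` and `adaptive_bh` and the default `lam=0.5` are illustrative assumptions.

```python
import random


def estimate_pi0(p_values, lam=0.5):
    """Threshold-based estimate of the proportion of true nulls (assumed baseline).

    Under the null, p-values are uniform on [0, 1], so roughly a fraction
    (1 - lam) of the true nulls should exceed lam. Counting p-values above
    lam and rescaling by 1 / (1 - lam) gives a (conservative) estimate.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    above = sum(1 for p in p_values if p > lam)
    # The proportion cannot exceed 1, so truncate the raw estimate.
    return min(1.0, above / (m * (1.0 - lam)))


def adaptive_bh(p_values, alpha=0.05, lam=0.5):
    """Benjamini-Hochberg step-up run at level alpha / pi0_hat.

    Returns the sorted indices of the rejected hypotheses.
    """
    m = len(p_values)
    # Floor the estimate at 1/m so the division below is always defined.
    pi0_hat = max(estimate_pi0(p_values, lam), 1.0 / m)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value passes its threshold.
        if p_values[i] <= rank * alpha / (m * pi0_hat):
            k = rank
    return sorted(order[:k])


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated data (an assumption for illustration): 800 true nulls with
    # uniform p-values and 200 alternatives with p-values pushed toward zero.
    nulls = [rng.random() for _ in range(800)]
    alternatives = [rng.random() ** 8 for _ in range(200)]
    p = nulls + alternatives
    print("estimated proportion of true nulls:", estimate_pi0(p))
    print("number of rejections:", len(adaptive_bh(p)))
```

Because some alternative p-values also fall above `lam`, this kind of estimator tends to overestimate the true proportion (0.8 in the simulation), which is the bias and variance trade-off the abstract refers to.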
Related articles
Towards Accurate Estimation of the Proportion of True Null Hypotheses in Multiple Testing
BACKGROUND: Biomedical researchers are now often faced with situations where it is necessary to test a large number of hypotheses simultaneously, e.g., in comparative gene expression studies using high-throughput microarray technology. To properly control false positive errors, the false discovery rate (FDR) approach has become widely used in multiple testing. The accurate estimation of FDR require...
A regression framework for the proportion of true null hypotheses
The false discovery rate is one of the most commonly used error rates for measuring and controlling rates of false discoveries when performing multiple tests. Adaptive false discovery rates rely on an estimate of the proportion of null hypotheses among all the hypotheses being tested. This proportion is typically estimated once for each collection of hypotheses. Here we propose a regression fra...
Estimating the proportion of true null hypotheses, with application to DNA microarray data
We consider the problem of estimating the proportion of true null hypotheses, π0, in a multiple-hypothesis set-up. The tests are based on observed p-values. We first review published estimators based on the estimator that was suggested by Schweder and Spjøtvoll. Then we derive new estimators based on nonparametric maximum likelihood estimation of the p-value density, restricting to decreasing an...
Estimating the Proportion of True Null Hypotheses under Dependence
Multiple testing procedures, such as the False Discovery Rate control, often rely on estimating the proportion of true null hypotheses. This proportion is directly related to the minimum of the density of the p-value distribution. We propose a new estimator for the minimum of a density that is based on constrained multinomial likelihood functions. The proposed method involves partitioning the s...
On efficient estimators of the proportion of true null hypotheses in a multiple testing setup
We consider the problem of estimating the proportion θ of true null hypotheses in a multiple testing context. The setup is classically modeled through a semiparametric mixture with two components: a uniform distribution on the interval [0, 1] with prior probability θ and a nonparametric density f. We discuss asymptotic efficiency results and establish that two different cases occur whether f vanis...